Exploiting conserved structure for faster annotation of non-coding RNAs without loss of accuracy
Abstract
MOTIVATION: Non-coding RNAs (ncRNAs), functional RNA molecules that do not code for proteins, are grouped into hundreds of families of homologs. To find new members of an ncRNA gene family in a large genome database, covariance models (CMs) are a useful statistical tool, as they use both sequence and RNA secondary structure information. Unfortunately, CM searches are slow. Previously, we introduced 'rigorous filters', which provably sacrifice none of the CM's accuracy while often scanning much faster. A rigorous filter, using a profile hidden Markov model (HMM), is built from the CM and applied to the genome database, eliminating sequences that provably could not be annotated as homologs. The CM is run only on the remainder. Some biologically important ncRNA families could not be scanned efficiently with this technique, largely because conserved secondary structure matters more than primary sequence in identifying these families. Current heuristic filters are also expected to perform poorly on such families.
RESULTS: By augmenting profile HMMs with limited secondary structure information, we obtain rigorous filters that accelerate CM searches for virtually all known ncRNA families from the Rfam Database and tRNA models in tRNAscan-SE. These filters scan an 8 gigabase database in weeks instead of years, and uncover homologs missed by heuristic techniques for speeding up CM searches.
AVAILABILITY: Software in development; contact the authors.
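The filter-then-verify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software, and it uses neither a real CM nor a profile HMM: `exact_score` is a toy stand-in for the slow structure-aware score, `upper_bound` is a toy stand-in for the fast filter score, and `CONSENSUS`, `PAIRED`, and the bonus values are hypothetical. The only property it demonstrates is the one that makes a filter rigorous: if the fast score is never lower than the slow score, skipping windows whose fast score falls below the threshold cannot lose a hit.

```python
from typing import List, Tuple

Hit = Tuple[int, float]  # (window start, exact score)

# Hypothetical toy model: a consensus sequence and three stem base pairs.
CONSENSUS = "GGCAUUGCC"
PAIRED = [(0, 8), (1, 7), (2, 6)]
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
PAIR_BONUS = 2.0


def sequence_score(window: str) -> float:
    """One point per position that matches the consensus."""
    return sum(1.0 for a, b in zip(window, CONSENSUS) if a == b)


def exact_score(window: str) -> float:
    """Stand-in for the slow score: sequence matches plus base-pair bonuses."""
    struct = sum(PAIR_BONUS for i, j in PAIRED
                 if (window[i], window[j]) in PAIRS)
    return sequence_score(window) + struct


def upper_bound(window: str) -> float:
    """Stand-in for the fast filter score.

    It assumes every base-pair bonus is earned, so it can never be
    lower than exact_score for the same window.
    """
    return sequence_score(window) + PAIR_BONUS * len(PAIRED)


def scan(db: str, threshold: float, use_filter: bool = True) -> List[Hit]:
    """Report every window whose exact score reaches the threshold."""
    n = len(CONSENSUS)
    hits: List[Hit] = []
    for start in range(len(db) - n + 1):
        window = db[start:start + n]
        # Rigorous filter: discard only windows that provably cannot hit.
        if use_filter and upper_bound(window) < threshold:
            continue
        score = exact_score(window)  # expensive step, run on the remainder
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Calling `scan(db, threshold, use_filter=False)` gives the exhaustive result, which the filtered scan reproduces exactly. In this toy the bound is loose because it ignores structure entirely, which mirrors the difficulty the abstract describes for families where structure carries most of the signal.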
Similar resources
Supplement to: Exploiting Conserved Structure for Faster Annotation of Non-coding RNAs Without Loss of Accuracy
This paper adds supplementary technical information to the paper “Exploiting Conserved Structure for Faster Annotation of Non-coding RNAs Without Loss of Accuracy” [3]. In particular, a fully automated scheme to design efficient rigorous filters is described, and some other technical issues avoided in the paper are discussed here. This supplement assumes the reader has read that paper, and is f...
Evaluation of the role of mico-RNAs in cardiomyocytes differentiation of mesenchymal stem cells
Stem cells are a good alternative for regenerative medicine because of characteristics such as self-renewal and differentiation potential. They are classified into several types, including embryonic stem cells, induced pluripotent stem cells, multipotent stem cells, and unipotent stem cells. Mesenchymal stem cells are extracted from adult tissues. Due to the lack of...
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-protein-coding transcripts, in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
Implementation and Optimization of Annotation and Interpretation Step of Next-Generation Sequencing Data for Non-Syndromic Autosomal Recessive Hearing Loss
Introduction: The precision and time required for analysis of data in next-generation sequencing (NGS) depend on many factors, including the tools utilized for alignment, variant calling, annotation and filtering of variants, personnel expertise in data analysis and interpretation, and the computational capacity of the lab. Optimizing the analysis is a challenging task. Method: An application software...
Journal: Bioinformatics
Volume: 20 Suppl 1
Publication date: 2004